HeAR

Health Acoustic Representations (HeAR) is a machine learning (ML) model that produces embeddings from health acoustic data. The embeddings can be used to efficiently build AI models for health acoustic-related tasks (for example, identifying disease status from cough sounds, or measuring lung function from exhalation sounds made during spirometry), using less data and less compute than training a model from scratch without the embeddings or the pretrained model.
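As a rough sketch of that workflow, the snippet below assumes a loaded HeAR model object (`hear_model`) that maps batches of two-second mono waveforms to fixed-length embedding vectors. The exact loading and inference calls depend on how you access the model (see the HeAR model card), so the `embed_clips` helper and the 16 kHz sample rate used here are assumptions for illustration only.

```python
import numpy as np

SAMPLE_RATE = 16_000   # assumed input rate; confirm against the HeAR model card
CLIP_SECONDS = 2       # HeAR operates on two-second clips

def embed_clips(hear_model, clips: np.ndarray) -> np.ndarray:
    """Map waveforms of shape (batch, SAMPLE_RATE * CLIP_SECONDS) to embeddings.

    `hear_model` is a hypothetical callable standing in for whatever inference
    entry point your chosen HeAR distribution exposes.
    """
    return np.asarray(hear_model(clips))

# Example: embed a single (silent) two-second clip.
clip = np.zeros((1, SAMPLE_RATE * CLIP_SECONDS), dtype=np.float32)
# embedding = embed_clips(hear_model, clip)   # shape: (1, embedding_dim)
```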

HeAR has been trained on more than 300 million two-second audio clips covering five types of health acoustic events: coughing, breathing, throat clearing, laughing, and speaking.

For details about how to use the model and how it was trained, see the HeAR model card.

Common Use Cases

The following sections present some common use cases for the model. You're free to pursue any use case, as long as it adheres to the Health AI Developer Foundations terms of use.

Data-efficient classification or regression

HeAR can be used for data-efficient classification or regression tasks, including:

  • Classifying respiratory conditions like COVID-19, tuberculosis, and COPD based on cough and breath sounds
  • Identifying different types of health acoustic events, such as coughs, wheezes, and snores
  • Classifying the severity of respiratory diseases based on acoustic features
  • Estimating quantities assumed to be roughly proportional to acoustic intensity, for example urine flow rate or spirometry measurements

With a small amount of labeled data, you can train a model on top of HeAR embeddings. Furthermore, the embedding from each acoustic sample only needs to be generated once and can be used as an input for a variety of different classifiers, with very little additional compute.
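As a minimal sketch of this pattern, the example below trains a scikit-learn logistic regression on precomputed embeddings. The embedding array, its dimensionality, and the binary labels are placeholders; in practice you would substitute the embeddings generated from your own labeled clips.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Placeholder data: precomputed HeAR embeddings (one row per two-second clip)
# and binary labels (e.g., disease-positive vs. disease-negative coughs).
# The embedding dimensionality of 512 is an assumption for this sketch.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 512)).astype(np.float32)
labels = rng.integers(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    embeddings, labels, test_size=0.25, random_state=0, stratify=labels
)

# A simple linear classifier on top of frozen embeddings: cheap to train, and
# the same embeddings can be reused for other downstream tasks.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

print("AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```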

For an example of how to use the model to train classifiers, see the HeAR linear classifier notebook in Colab.

Next Steps